STRGEN.DOC[DOC,LMM] - www.SailDart.org

perm filename STRGEN.DOC[DOC,LMM] blob sn#044087 filedate 1973-05-17 generic text, type T, neo UTF8

APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
Exhaustive Generation of Cyclic and Acyclic Isomers.(1,2)

L. Masinter
N.S. Sridharan
J. Lederberg
D.H. Smith

Stanford University
Stanford, California

May, 1973

ABSTRACT. A systematic method of identification of all possible
graph isomers consistent with a given empirical formula is
described. The method, embodied in a computer program,
generates a complete list of isomers. Duplicate structures are
avoided prospectively.

------------------

1) For Part XI see R. Carhart and C. Djerassi, ␈↓↓J. Chem. Soc.,
Perkin II,␈↓β submitted, 1973.

2) Financial support for this work was provided by the National Institutes
of Health (RR 00612-03) and the Advanced Research Projects Agency (SD-183).

Problems of structural isomerism in chemistry have received much

attention. But only occasional inroads have been made toward a

systematic solution of the underlying graph theoretical problems of

structural isomerism. Recently the "boundaries, scope and limits"

(3) of the subject of structural isomerism of acyclic molecules have

been defined by the DENDRAL algorithm (3). This algorithm permits an

enumeration and representation of all possible acyclic molecular

structures with a given molecular formula.

Acyclic molecules represent only a subset of molecular structures,

however, and it may be argued that cyclic structures (including those

posessing acyclic chains) are of more general interest and importance

to modern chemistry. An approach to cyclic structure generation has

appeared in a previous paper in this series (4). That approach,

which operates on a set of previously generated acyclic forms by

labelling hydrogen atoms pairwise and connecting the atoms to which

they are attached with a new bond, has one serious drawback. The

approach cannot make efficient use of the symmetry properties of

cyclic graphs; hence an inordinate amount of computer time must be

------------------
3) J. Lederberg, G.L. Sutherland, B.G. Buchanan, E.A. Feigenbaum,
A.V. Robertson, A.M. Duffield, and C. Djerassi, ␈↓↓J. Amer. Chem.
Soc., ␈↓α91␈↓β, 2973 (1969).

4) Y.M. Sheikh, A. Buchs, A.B. Delfino, G. Schroll, A.M. Duffield,
C. Djerassi, B.G. Buchanan, G.L. Sutherland, E.A. Feigenbaum, and J.
Lederberg, ␈↓↓Org. Mass Spectrom.␈↓β, ␈↓α4␈↓β, 493 (1970).

spent in retrospective checking of each candidate structure with

existing structures to remove duplicates. For this reason, an

alternative approach to construction of cyclic molecules has been

developed. This approach is designed to take advantage of the

underlying graph theoretic considerations, primarily symmetry, to

arrive at a method for more efficient construction of a complete and

irredundant list of isomers for a given empirical formula. Central

to the successful solution of this problem is the generation of all

positional isomers obtained by substitutions on a given ring system.

This topic has received attention for nearly 100 years, with limited

success (5). Its more general ramifications go far beyond organic

chemistry. Graph theoreticians have considered various aspects of

this topic, frequently, but not necessarily, in the context of

organic molecules. Polya has presented a theorem (6) which permits

calculation of the number of structural isomers for a given ring

system. Hill (7a,b) has applied this theorem to enumeration of

isomers of simple ring compounds and Hill (7c) and Taylor (8) have

------------------
5) See, for example, A.C. Lunn and J.K. Senior, ␈↓↓J. Phys. Chem.␈↓β,
␈↓α33␈↓β, 1027 (1929) and references cited therein.

6) a) G. Polya, ␈↓↓Compt. rend.␈↓β, ␈↓α201␈↓β, 1167 (1935);
b) G. Polya, ␈↓↓Helv. Chim. Acta␈↓β, ␈↓α19␈↓β, 22 (1936);
c) G. Polya, ␈↓↓Z. Kryst.␈↓β ␈↓α92␈↓β, 415 (1936);
d) G. Polya, ␈↓↓Acta Math.␈↓β, ␈↓α68␈↓β, 145 (1937).
7) a) T.L. Hill, ␈↓↓J. Phys. Chem.␈↓β, ␈↓α47␈↓β, 253 (1943);
b) T.L. Hill, ␈↓↓ibid.␈↓β, p. 413.
c) T.L. Hill, ␈↓↓J. Chem. Phys.␈↓β, ␈↓α11␈↓β, 294 (1943).

8) W.J. Taylor ␈↓↓J. Chem. Phys.␈↓β, ␈↓α11␈↓β, 532 (1943).

pointed out that Polya's theorem permits enumeration of geometrical

and optical isomers in addition to structural isomers. More

recently, formulae for enumeration of isomers of monocyclic aromatic

compounds based on graph theory, permutation groups and Polya's

theorem have been presented (9). This history of interest and

results provides only marginal benefit to the organic chemist.

Although the number of isomers may be interesting, these methods (5-

9) do not display the structure of each isomer. Also, these methods

do not provide information on the more general case where the ring

system is embedded in a more complex structure. Even for simple

cases the task of specifying each structure by hand, without

duplication, is an onerous one.

␈↓αMETHOD␈↓β

␈↓αOVERVIEW␈↓β

␈↓αFramework.␈↓β The framework for this method is that chemical

structures consist of some combination of acyclic chains and rings or

ring systems (10,11). The problem of construction of acyclic isomers

------------------
9) A.T. Balaban and F. Harary, ␈↓↓Rev. Rumaine de Chimie␈↓β, ␈↓α12␈↓β, 1511
(1967).

10) J. Lederberg, DENDRAL-64, A system for Computer Construction,
Enumeration, and Notation of Organic Molecules, Interim report to
NASA, Parts I, II, and III, 1965.

11) It is assumed that structures are completely connected by
chemical bonds; thus catenates and threaded structures are viewed as
consisting of separate molecules.

(and radicals) has been solved previously (3). If all possible ring

systems can be constructed from all or part of the atoms in the

empirical formula, and all possible acyclic parts are available from

the acyclic generator, the combination of ring systems with acyclic

parts in all unique ways would yield the complete list of isomers.

The method for construction of ring systems is described below. This

description employs some terms which require definition. The

definitions also serve to illustrate the taxonomic principles which

underlie the operation of the cyclic structure generator. The

generator's view of molecular structure differs in some respects from

the chemist's. A chemist, for example, may view structures

possessing the same functional group or ring as related. The

generator works at the more fundamental level of the vertex-graph

(10), as described below.

␈↓αChemical Graph.␈↓β A molecular structure may be viewed as a graph,

termed the ␈↓↓chemical graph␈↓β, or skeleton. A chemical graph consists

of ␈↓↓nodes␈↓β, with associated atom names, and ␈↓↓edges␈↓β, which correspond

to chemical bonds. Consider as an example the substituted piperazine,

␈↓αI␈↓β, whose chemical graph is illustrated in Chart I as ␈↓αII.␈↓β Note

that hydrogen atoms are ignored by convention, while the symbol "U"

is used to specify the unsaturation. The degree (primary, secondary,

...) of a node in the chemical graph has its usual meaning, i.e., the

number of (non-hydrogen) edges connected to it. The valence of each

atom determines its maximum degree in the graph. As usally displayed

by chemists in planar representation, the chemical graph describes

the connectivity rather than the geometric configuration of a

molecular structure.

␈↓αSuperatom.␈↓β In general, a chemical graph can be separated into

cyclic and acyclic parts. Each cyclic structural sub-unit may be

deemed a ␈↓↓superatom␈↓β possessing any number of ␈↓↓free valences␈↓β (12).

The chemical graph ␈↓α2␈↓β arises from a combination of two carbon atoms

with ring-superatom ␈↓α3.␈↓β Ring-superatom ␈↓α3␈↓β possesses the indicated

free valences to which the remaining hydrogen and two methyl radicals

will be attached (Chart I).

␈↓αCiliated Skeletons.␈↓β A ␈↓↓ciliated skeleton␈↓β is a skeleton with free

valences. Ring-superatom ␈↓α3␈↓β arises from the ciliated skeleton ␈↓α4␈↓β

by associating the atom names of eight carbon and two nitrogen atoms

with the skeleton (Chart I).

␈↓αCyclic Skeleton.␈↓β A chemical graph whose nodes are not associated

with atom names and which contain no acyclic parts and no free

------------------
12) A free valence is a bond with an unspecified terminus. Any
substructure, cyclic or not, may be treated as a superatom; however,
the term, in this paper, is generally restricted to cyclic (termed
ring-) superatoms.

valences is termed a ␈↓↓cyclic skeleton.␈↓β Ciliated skeleton ␈↓α4␈↓β arises

from one way of associating sixteen free valences with the nodes on

the cyclic skeleton ␈↓α4␈↓β (Chart I).

␈↓αVertex-Graph.␈↓β Vertex-graphs (10) are cyclic skeletons from which

nodes of degree less than three have been deleted. The vertex-graph

of the cyclic skeleton ␈↓α5␈↓β is the regular trivalent graph (11) of two

nodes, ␈↓α6.␈↓β Note that the remaining nodes of the cyclic skeleton ␈↓α5␈↓β

are of degree two. Removal of these secondary nodes from ␈↓α5␈↓β while

retaining the interconnections of the two tertiary nodes yields ␈↓α6␈↓β

(Chart I).

As an illustration of the variety of structures which may be

constructed from a given vertex-graph and empirical formula, for

example, C H N , consider that graph ␈↓α6␈↓β is the vertex-graph for
10 20 2

all bicyclic ring systems (excluding spiro forms). Cyclic skeletons

␈↓α7␈↓β and ␈↓α8␈↓β (Chart I), for example, may be constructed from eight

secondary nodes and ␈↓α6.␈↓β There are many ways of associating sixteen

free valences with each cyclic skeleton, resulting in a larger number

of ciliated skeletons. For example, ␈↓α9␈↓β and ␈↓α10␈↓β arise from

different allocations of sixteen free valences to ␈↓α5␈↓β (Chart I).

There is only one way to associate eight carbon atoms and two

nitrogen atoms with each ciliated skeleton to yield superatoms (e.g.

THIS PAGE RESERVED FOR CHART I

␈↓α11␈↓β and ␈↓α12␈↓β, Chart I). However, several structures are obtained by

associating the remaining two carbon atoms (in this example) with

each superatom, as an ethyl or two methyl groups. Chemical graphs

␈↓α13␈↓β and ␈↓α14␈↓β, for example, arise from two alternative ways of

associating two methyl groups with superatom ␈↓α3.␈↓β

␈↓αMultiple Bonds.␈↓β For the purposes of this program we adopt the

formalism that all multiple bonds (double, triple, ...) are

considered to be small rings by the program. Previous versions

(acyclic generator) differ from this program in that double and

triple bonds are regarded as specially labelled edges.

␈↓αAIMS␈↓β

The cyclic structure generator must produce a complete list of

structures without duplication. By duplicate structures we mean

structures which are equivalent in some well-defined sense. The class

of isomers generated by the program includes only connectivity

isomers. Transformations allowed under connectivity symmetry preserve

the valence and bond distribution of every atom. Connectivity

symmetry does not consider bond lengths or bond angles. This choice

of symmetry results in construction of a set of topologically unique

isomers. A more detailed discussion of eqivalence is discussed in

Appendix A and in the accompanying paper (13); a discussion of

isomerism and symmetry is presented in Appendix B.
------------------
13) L. Masinter, N.S.Sridharan, Labelling Objects with Symmetry, to
appear.

␈↓αSTRATEGY␈↓β

The strategy behind the cyclic structure generator is strongly tied

to the framework described above. The strategy is summarized in

greatly simplified form in Figure 1. The vertex- graphs from which

structures are constructed can be specified for a given problem by a

series of calculations. Thus Part A of the program (Figure 1)

partitions the pot of atoms in all possible ways; each partition

consists of those atoms assigned to one or more "superatom-pots" and

a "remaining pot." Each superatompot is a collection of atoms from

which all possible, unique ring-superatoms (12) can be constructed

based on the appropriate vertex-graphs (Part B, Fig. 1). Each ring-

superatom will be a ring system in completed structures. The atoms in

the remaining pot will form acyclic parts of the final structures

when combined in all possible, unique ways with the ring-superatoms

from the corresponding initial partition (Part C, Fig. 1).

␈↓αDESCRIPTION␈↓β

We are faced with the difficulty of describing a complex computer

program in the traditional mode of presentation in a scientific

journal. The narrative form is not the ideal medium for this

description; simple examples do not always indicate all essential

aspects of a program. A deeper understanding of a program could be

engendered through the use of a large number of well chosen examples,

but the length of such a presentation would be excessive and would

tax the patience of even the most interested reader.

We are thus aware of the insufficiency of considering only one

example in the following written description. We have adopted the

strategy of presenting essential aspects of the procedure for

structure generation in the main body of the text. Details of the

description which might obscure the principle concepts are placed in

Appendices C and D. Mathematical details are available elsewhere

(14,15). We hope this serves the purpose of providing the casual

reader with a deeper understanding of the method without having to

contend with details which, on the other hand, are important to

others who wish to make use of our approach.

The example chosen to illustrate each step of the method is C H .
6 8

This example does not contain bivalent or trivalent atoms (e.g.,

oxygen and nitrogen, respectively) or atoms of valence greater than

four, nor any univalent atoms other than hydrogen (e.g., chlorine,

fluorine).

␈↓αPartitioning and Labelling.␈↓β The mechanism for structure generation

involves a series of "partitioning" steps followed by a series of

------------------
14a) H. Brown, L. Masinter, and L.Hjelmelend, Constructive Graph
Labelling Using Double Cosets; ␈↓↓Discrete Mathematics␈↓β, in press
b) Stanford Computer Science Memo STAN-CS-72-0318.

15) H. Brown and L. Masinter, An Algorithm for the Generation of
Graphs of Organic Molecules; Discrete Mathematics, submitted.

"labelling" steps. Partitions are made of items which must be

assigned to objects (usually graph structures or parts thereof) as

the molecular structures are built up from the vertex-graphs. The

process by which items are assigned to the graphs is termed labelling

(13,14). Examination of Chart I reveals the different types of items

involved. For example, nodes are partitioned among and labelled upon

the edges of the vertex-graphs to yield the cyclic skeletons. Free

valences are partitioned among and labelled upon the nodes of cyclic

skeletons to yield ciliated skeletons, and so forth.

Partitioning steps in the subsequent discussion are carried out

assuming that objects among which items are partitioned are indist-

inguishable. Distinguishability of objects (edges, nodes, ...) is

specified during labelling and will be discussed in a subsequent

section. The partitioning steps performed by the program are

outlined in Table I. Each step is described in more detail below.

--------------------------------------------------------------------
Table I. Partitioning Steps Performed by the Cyclic Structure
Generator

Step # Partition Among

1 Atoms and Unsaturations Superatompots and
in Empirical Formula Remaining Pot.

2 Free Valence Atoms in Superatompot

3 Secondary Nodes Loops / Non-loops

4 Non-loop Secondary Edges of Graph
Nodes

6 Ring-superatoms and Efferent Links
Remaining Pot (see Appendix D)

--------------------------------------------------------------------

␈↓αPART A. Superatom Partitions.␈↓β
Ring-superatoms are "two-connected" structures, i.e., the ring-

superatom cannot be split into two parts by scission of a single

bond. The atoms in an empirical formula may be distributed among

from one to several such two-connected ring-superatoms. A

distribution which allots atoms to two or more superatompots will

yield (respectively) structures containing two or more ring-

superatoms linked together by single bonds (or acyclic chains) (16).

------------------
16) Chemists are more familiar with terms such as rings or ring
systems. The term two-connected is used here in conjunction with
ring-superatoms for a more precise description. For example,
biphenyl may be viewed as a single ring system or two rings depending
on the chemical context. In this work, however, biphenyl consists of
two ring-superatoms (two phenyl rings) linked by a single bond.

In the generation process, one must find all possible ways of

partitioning the given formula into superatom pots and a remaining

pot, such that molecules can be constructed. The considerations in

forming superatom partitions deal primarily with valence and

unsaturation. This procedure is summarized in Appendix C, Superatom

Partitions. The partitions which result are summarized in Table II.

--------------------------------------------------------------------
Table II. Allowed Partitions of C U Into Superatompots and Remaining
6 3
Pot.

Partition Number of Superatompot Number Remaining
Number Superatompots 1 2 3 Pot

1 1 C U - - -
6 3
2 1 C U - - C
5 3 1
3 1 C U - - C
4 3 2
4 1 C U - - C
3 3 3

5 2 C U C U - -
4 2 2 1
6 2 C U C U - C
3 2 2 1 1
7 2 C U C U - C
2 2 2 1 2

8 2 C U C U - -
4 1 2 2
9 2 C U C U - C
3 1 2 2 1

10 2 C U C U - -
3 2 3 1

11 3 C U C U C U -
2 1 2 1 2 1
--------------------------------------------------------------------

␈↓αPART B. Ring-superatom Construction.␈↓β

Each partition (Table II) must now be treated in turn. The complete

set of ring-superatoms for each superatompot in a given partition

must be constructed. The major steps in the procedure are outlined

in Figure 2.

␈↓αValence List.␈↓β The first step in part B is to strip the superatompot

of atom names, while retaining the valence of each atom. The numbers

of each type of atom are saved for later labelling of the ciliated

skeleton (Chart I). A valence list may then be specified, giving in

order the number of bi-, tri- , tetra- and n-valent nodes which will

be incorporated in the superatom. Thus the superatompot C U is
6 3

transformed into the valence list 0 bivalents, 0 trivalents, 6

tetravalents (0, 0, 6), and C U becomes (0, 0, 4) (Figure 2).
4 2

␈↓αCalculation of Free Valence.␈↓β From the valence list and the

associated unsaturation count the number of free valences of each

superatompot is determined uniquely. (see Calculation of Free

Valence, Appendix C). For C U the free valence is eight (Fig. 2).
6 3

␈↓αPartitioning of Free Valence.␈↓β The free valences are then

partitioned among the nodes in the valence list in all possible,

unique ways.

␈↓αDegree List.␈↓β Each partition of free valences alters the effective

valence of the nodes in the original valence list with respect to the

ring-superatom. In the example, assignment of one or two free

valences to a tetravalent node transforms this node into a tri- or

bivalent node respectively. As the ring-superatom is constructed,

those tetravalent nodes which have been assigned, say, two free

valences, have then only two valences remaining for attachment to the

ring-superatom. These nodes are then of degree (17) two and may be

termed secondary nodes. Thus the partition of free valences

2,2,2,2,0,0 on six tetravalent nodes yields the degree list (4,0,2)

(Fig. 2) as four of the tetravalent nodes receive two free valences

each yielding four nodes of degree two (secondary) and leaving two

nodes of degree four (quaternary). The program keeps track of the

number of free valences assigned to all nodes for use in a subsequent

step.

␈↓αLoops.␈↓β As will be clarified in the subsequent discussion, there are

several general types of ring-superatoms which cannot be constructed

from the vertex-graphs available in the CATALOG (described below).

------------------
17) Use of the term degree in this report refers to the number of
bonds other than free valences, with double bonds being counted
twice. A free valence may or may not eventually be attached to a
hydrogen atom in the final structure.

These are all cases of multiple extended unsaturations either in the

form of double bonds or rings. Examples are the following:

1) bi-, tri-, ... n-cyclics with exocyclic double bonds;
2) some types of SPIRO ring systems;
3) allenes extended by additional double bonds, e.g.,
C=C=C=C

The concept of a loop, consisting of a single unsaturation and at

least one bivalent node must be specified for these cases. Examples

of loops containing one, two and three bivalent nodes are shown in

Chart II. Note that the two remaining "ends" of thhe unsaturation

will yield a "looped structure" when attached to a single node in a

graph (shown as X, Chart II).
---------------------
Chart II
bivalents = 1 2 3

---------------------

The method for specification of loops is discussed in Calculation of

Loops, Appendix C.

␈↓αPartitioning of Secondary Nodes among Loops and Nonloops.␈↓β The

secondary nodes in the degree list are partitioned among the loops

(if any) calculated in the previous step and non-loop secondary nodes

of the graph Aspects of this partitioning step are presented in

Partitioning of Secondary Nodes, Appendix C. Results for the example

are indicated in Figure 2.

␈↓αReduced Degree List.␈↓β This procedure yields the reduced degree list

which contains none of the secondary nodes originally present in the

degree list. Any secondary nodes appearing in the reduced degree list

are termed "special" secondary nodes as these nodes will have loops

attached in subsequent steps.

␈↓αVertex-Graphs.␈↓β The reduced degree lists are used to specify a set

of vertex-graphs for the eventual ring-superatoms. All two-connected

structures can be described by their vertex-graphs, which are, for

most structures, regular trivalent graphs. This concept has been

described in detail by Lederberg (10), who has also presented a

generation and classification scheme for such graphs. Given a set of

all vertex-graphs, the set of all ring-superatoms may be specified.

The vertex-graphs are maintained by the program in the CATALOG.

Catalog entries for regular trivalent graphs possessing two and four

nodes are presented in Table III. This list must be supplemented by

additional vertex-graphs to cover several special cases required for

generation of all structures for the example. These are also

presented in Table III. With the reduced degree list of a

superatompot, the program requests the appropriate CATALOG entries.

In the example (Fig. 2), the reduced degree list (4,0,2) specifies

vertex-graphs containing two quaternary nodes (tetravalent dihedron).

The reduced degree list (2,4,0) specifies regular trivalent graphs of

four nodes, of which there are two: 4AA and 4BB (Table III). When

␈↓↓only␈↓β secondary nodes are present in the reduced degree list, the

graph "Singlering" (Table III) is utilized.

␈↓αInterlude.␈↓β Up to this point the program has effectively decomposed

the problem into a series of subproblems, working down from the total

pot of atoms through a series of partitions and subpartitions to the

set of possible vertex-graphs. In subsequent steps the vertex-graphs

are expanded to the final structures by a series of constructive

graph labellings (Table IV).

--------------------------------------------------------------------
Table IV. The Six Graph Labelling Steps Performed by the Labelling
Algorithm

Labelling Step Function

1 Label Edges of Vertex-Graphs with
Special Secondary Nodes.

2 Label Edges of Resulting Graphs with
Non-Looped Secondary Nodes.

3 Label Loops of Resulting Graphs with
Looped Secondary Nodes.

4 Label Nodes of Skeletons with Free
Valences.

5 Label Nodes of Skeletons with Atom Names.

6 Label Free Valences of Superatoms with
Radicals (see Appendix D).
--------------------------------------------------------------------

␈↓αLabelling Edges of Vertex-Graphs with Special Secondary Nodes.␈↓β

Special secondary nodes are those that will have loops attached. The

specification of the possible attachments of the nodes to the graph

ia a "labelling" procedure. This is the first of six such graph

labelling steps performed by the program. (Table IV). All of these

labelling steps involve the same combinatorial problem, that of

associating a set of ␈↓↓n␈↓β labels, not necessarily distinct, with a set

of objects with arbitrary symmetry (13). The same labelling algorithm

is utilized for each of the six labelling steps. A description of the

underlying mathematics and proof of completeness and irredundancy

appears separately (14).

Some aspects of the first labelling step indicate how equivalent

labellings (which would eventually yield duplicate structures) may be

avoided prospectively, by recognition of the symmetry properties of

the graph; in the first labelling, the vertex-graph. These symmetry

properties are expressed in terms of the permutation group (see

Appendix A and refs. 13 and 14) on the edges of the vertex-graph.

This permutation group, which defines the equivalence of the edges,

may be specified in the CATALOG or, alternatively, calculated as

needed by a separate part of the cyclic structure generator. As

subsequent steps are executed, a new permutation group (e.g., on the

nodes for labelling step four, Table IV) is derived as necessary

(13). Thus, only labellings which result in unique expansions of the

structure are permitted. The reader examining Fig 2 may note that for

this simple example the symmetries of the vertex-graphs and

subsequent skeletons can be discerned easily by eye. For example, all

edges of the tetravalent dihedron are equivalent, as are all the

edges of the regular trivalent graphs 2A and also 4BB. The $3BCB

graph (Table II, Fig. 2) has four equivalent edges and one other

edge, and so forth. In the general case, however, the symmetries of

the vertex-graphs and subsequent expansions thereof are not always

obvious.

With the group on the edges specified, the labelling of the vertex-

graphs with special secondary nodes is carried out. The results of

this procedure for partitions containing loops are indicated in

Figure 2.

␈↓αLabelling with Non-Loop Secondary Nodes.␈↓β The graphs which resulted

from the previous labelling are now labelled with the partitions of

non-loop secondary nodes (Appendix C). Each of the five partitions

for the left-most vertex-graph in Fig. 2 immediately above results in

a single labelling, as all four edges of the graph are equivalent.

When edges are distinguishable there may be several ways to label a

graph with a single partition. There are, for example, for the $3BCB

graph, two ways to label with the partition 3,0,0,0,0, four ways with

the partition 2,1,0,0,0 and three ways with the partition 1,1,1,0,0

(Fig. 2).

␈↓αLabelling with Loop Secondary Nodes.␈↓β There remain unassigned to the

graphs at this point only secondary nodes which were assigned to

loops. These nodes are first partitioned among the loops. (see

Partitioning of Loop Secondary Nodes, Appendix C). For example,

following the path from the degree list (4,0,2) through labelling

with non-loop secondary nodes (Fig. 2), there are two ways of

labelling the two equivalent loops with four secondary nodes. There

is one way to label the two loops of the adjacent graph with three

secondary nodes and one way of labelling the two loops of each of the

two remaining graphs in this section of Figure 2 with two secondary

nodes. In this example (C U ) the loops in every case are equivalent
6 3

or there is only one loop to be labelled. In the general case loops

may not be equivalent, resulting in a greater number of ways to label

loops with a given partition of secondary nodes.

␈↓αCyclic Skeletons.␈↓β The previous labelling steps specified the number

of secondary nodes on each edge of and loop attached to the vertex-

graphs. All atoms in the original superatompot are thus accounted

for. A representation of the result is the cyclic skeleton, where

nodes and their connections to one another are specified. (These

skeletons begin to resemble conventional chemical structures.)

␈↓αLabelling with Free Valences.␈↓β The nodes in a cyclic skeleton are

then labelled with free valences, yielding ciliated skeletons. This

labelling is trivial in the example, as all atoms are of the same

valence (four) (Figure 2). Free valence labelling is performed with

knowledge of how many atoms of each valence were present in the

original superatompot, but independent of the ␈↓↓identities␈↓β of the

atoms. The combinatorial complexity of this labelling problem follows

from the possible occurance of atoms with differing valencies. In the

general case there may be several ways to perform this labelling on a

single cyclic skeleton, whereas in the C U example there is only one
6 3
way.

␈↓αLabelling with Atom Names.␈↓β The nodes of a ciliated skeleton are

then labelled with atom names to yield the ring-superatom(s). Again

this labelling is trivial in the example, as only one type of atom is

present (carbon), yielding in each case only a single superatom (Fig.

2). If there is more than one type of atom with the same valence

(e.g., silicon and carbon), the labelling problem is more complex.

Each node of appropriate valence may be labelled with either type of

atom. Duplicate structures are avoided by calculations involving the

group pertaining to the set of nodes of equal valence.

␈↓αPART C. Acyclic Generator.␈↓β

The superatom partition expanded in the example had no atoms assigned

to acyclic chains (remaining pot). The set of superatoms on

completion of Part B, above, thus yields the set of 36 structures on

placement of a hydrogen atom on each free valence (Fig. 2). If the

superatom partition (partitions 2-11, Table II) contained more than

one superatompot or any atoms in the remaining pot, the acyclic

generator must be used to connect the segments of the structure in

all ways. This procedure is described in detail in Appendix D.

␈↓αDISCUSSION␈↓β

␈↓αCompletion of␈↓β C H . The example (Fig. 2) has considered only
6 8

expansion of a single superatom partition. It might be instructive

for the reader to attempt to generate all, or at least the remaining,

structures for C H . The number of solutions is presented in a
6 8

subsequent section. If the algorithm as outlined in Figure 2 is

followed, it is suggested that the initial superatom partitions in

Table II be examined carefully. These partitions yield some

indication of the types of structures which will result from each

partition. For example, partition 4, C U in a single superatompot,
3 3

plus three carbons in the remaining pot, should yield all structures

containing a three-membered ring possessing two double bonds or a

triple bond. As there are only two free valences, the remaining

atoms can be in a single chain (as a propyl or iso-propyl radical) or

as a methyl and an ethyl group, but not as three methyl groups.

␈↓αCompleteness and Irredundancy. Although a mathematical proof␈↓β of the

completeness and irredundancy of the method exists (15), there is no

guarantee that the implementation of the algorithm in a computer

program maintains these desired characteristics. Confidence in the

completeness and irredundancy of a program of this complexity can be

engendered in the following ways:

1) Verification of the program's performance by another,

completely independent approach. An independent method has been

developed which enumerates, but does not construct all isomers of

compositions containing C,H,N, and O (18). It is interesting that

the program for simple counting of the solutions is significantly

slower than construction of all of the solutions, despite some effort

to improve the efficiency of the former program. Thus, due to

limitations of computer time, we have been limited to compositions

containing only 5 or fewer non-hydrogen atoms. For these cases,

however, the numbers of isomers obtained by both programs agree.

2) Testing by manual generation of structures. Several

chemists, all without knowledge of the algorithm described above,

have been given several test cases, including C U , from which
6 3

structures were generated by hand. Familiarity with chemistry is no

guarantee of success, as evidenced by the performance of three

chemists for the superficially simple case of C U (C H , Table V).
6 3 6 8

------------------
18) L. Masinter, Enumeration of Graphs of Molecules; in preparation.

---------------------------------------------------------------
*
Table V. Performance of Three Chemists in Manual Generation
of Isomers of C H (C U ). There are 159 Isomers.
6 8 6 3

Number Generated Type of Error

Chemist 1 161 4 duplicates; 4 omissions
2 with 7 carbon atoms.

Chemist 2 168 16 duplicates; 7 omission

Chemist 3 160 2 duplicates; 1 omission

* One PhD and two graduate students.
---------------------------------------------------------------

This example indicates that for more than very trivial cases,

it is extremely difficult to avoid duplicates (tricyclics, for

example, are difficult to visualize when testing for duplicates) and

omissions. Omissions appear to result from both carelessness and

forgetting ring systems that are implausible or unfamiliar. The

program seems better at testing the chemist than vice versa. In

every instance of manual structure generation, no one has been able

to construct a legal structure that the program failed to construct.

No one has been able to detect an instance of duplication by the

program. This performance builds some confidence, but manual

verification of more complicated cases is extremely tedious and

difficult. Isomers for many empirical formulae have been generated,

and some results are tabulated in Table VI. The choice of examples

has been motivated by a desire to test all parts of the program where

errors may exist while keeping the number of isomers small enough to

allow verification. In this manner all obvious sources of error have

been checked, for example, construction of loops on loops, multiple

types of atoms of the same valence (e.g. Cl, Br, I) and partitions

containing atoms of several different valences including penta- and

hexavalent atoms.

3) Varying the order of generation. The structure of the

program permits additional tests by doing some operations in a

different order. For example, one variation allowed is to leave

hydrogens associated with the atoms in each partition rather than to

strip them away initially and place them on the remaining free

valences in the last step. Each such test has resulted in the same

set of isomers.

4) Using Polya enumeration (6) at the various labelling steps

of the procedure to verify the correctness of sub-parts of the

program. Using various combinatorial formulae, one can insure that

the results of at least parts of the program are consistant with

independant calculations. This approach was used extensively in the

development of the labelling algorithm.

In summary, the verification procedures utilized have all indicated

absence of errors in the computer implementation of the algorithm.

Also, there is no clear reason why generation of larger sets of

isomers should not also proceed correctly. The final verdict

however, must await development of new mathematical tools for

verification by enumeration (see above) or an alternative algorithm.

--------------------------------------------------------------------
Table VI. The Number of Isomers for Several Empirical Formulae

Empirical Example Number of Isomers Manually Verified?
Formula Compound

C H benzene 217 yes
6 6

C H 1,3-cyclohexadiene 159 yes
6 8

C H cyclohexene 77 yes
6 10

C H cyclohexane 25 yes
6 12

C H hexane 5 yes
6 14

C H O phenol 2237
6 6

C H O cyclohexanone 747 no
6 10

C H O 2-hexanone 211 yes
6 12

C H N pyrazole 155 no
3 4 2

C H N 2-pyrazoline 136 yes
3 6 2

C H N tetrahydropyrazole 62 no
3 8 2

C H N propylenediamine 14 yes
3 10 2

C H P (pentavalent P) 110 no
4 9 1
--------------------------------------------------------------------

␈↓αConstraints.␈↓β The structure generator is designed to produce a list

of all possible graph connectivity isomers (Appendix B). This list

contains many structures whose existence seems unlikely based on

present chemical knowledge. In addition, the program may be called

on to generate possible structures for an unknown in the presence of

a body of data on the unknown which specify various features, (e.g.,

functional groups) of the molecule. In such instances mechanisms are

required for constraining the generator to produce only structures

conforming to specified rules. The implementation of the acyclic

generator possessed such a mechanism in the form of GOODLIST (desired

features) and BADLIST (unwanted features) (3) which could be utilized

during the course of structure generation.

The cyclic generator is less tractable. As in prospective avoidance

of duplicate structures, it is important that unwanted structures, or

portions thereof, be filtered out as early in the generation process

as possible. It is relatively easy to specify certain general types

of constraints in chemical terms, for example, the number of each of

various types of rings or ring systems in the final structure, ring

fusions, functional groups, sub-structures and so forth. It is not

always so easy to devise an efficient scheme for utilizing a

constraint in the algorithm, however. As seen in the above example

(Fig. 2) the expanded superatom partition results in what would be

viewed by the chemist as several very different ring systems.

The design of the program facilitates some types of constraints. For

example, the program may be entered at the level of combining

superatoms to generate structures from a set of known sub-structures.

If additional atoms are present in an unknown configuration, they can

be treated as a separate generation problem, the results of which are

finally combined in all ways with the known superatoms. This

approach will not form additional two-connected structures, however.

Constraints which disallow an entire partition may be easily

included. For example, it is possible to generate only open chain

isomers by "turning off" the appropriate initial superatom partition.

Much additional work remains, however, before a reasonably complete

set of constraints can be included. The implementation of each type

of constraint must be examined and tested in detail to ensure that

the generator remains thorough and irredundant.

␈↓αCONCLUSIONS␈↓β

The boundaries, scope and limitations of chemical structure can now
be specified.

␈↓αACKNOWLEDGEMENTS␈↓β

We wish to express our thanks to Professor Harold Brown, (currently

at Ohio State University) who provided valuable assistance in the

formulation of the mathematical principles underlying the structure

generator.

␈↓αAppendix A. Equivalence Classes and Finite Permutation Groups.␈↓β
Two members of a set of possible isomers may be defined to be
equivalent if a specified transformation of one member causes it to
be superimposable upon another member of the set. For example, there
are fifteen possible ways of attaching two chlorine and four hydrogen
atoms to a benzene ring (Chart III).
-----------------------------------------------------------------
Chart III

-----------------------------------------------------------------

If rotations by multiples of 60 degrees are specified as allowed
transformations, the fifteen structures fall logically into three
classes, termed "equivalence classes" (Scheme 4). Within each
equivalence class structures may be made superimposable by the
rotational transformation. If one element (in this case a molecular
structure) is chosen from each equivalence class, the complete set of

possible structures is determined, without duplication. It is the
task of the labelling algorithm to produce one and only one graph
labelling corresponding to one member of each equivalence class.

The set of transformations which define an equivalence class is
termed a "finite permutation group." This permutation group may be
calculated based on the symmetry properties of a graph (or chemical
structure in the example of Scheme 4). This calculation provides the
mechanism for prospective avoidance of duplication. These procedures
are described more fully in the accompanying paper (13).

␈↓αAppendix B. Isomerism and Symmetry.␈↓β

Appendix A introduced the concept of equivalence classes and finite
permutation groups. The selection of transformation (Appendix A)
directs the calculation of the permutation group and thus defines the
equivalence classes. Different types of transformation may be
allowed depending on the symmetry properties of the class of isomers
considered. This Appendix discusses several of the possible types of
isomerism, most of which are familiar to chemists. The reader
seeking a more thorough discussion of some types of isomerism
discussed below is referred to an exposition of molecular symmetry in
the context of chemistry and mathematics (19).

Isomers are most often defined as chemical structures possessing the
same empirical formula. Different concepts of symmetry give rise to
different classes of isomers, some of which are described below.

␈↓αPermutational Isomers.␈↓β Permutational isomers are isomers which have
in common the same skeleton and set of ligands. They differ in the
distribution of ligands about the skeleton. Gillespie et al. (20)
and Klemperer (21) have used the concept of permutational isomers to
probe into unimolecular rearrangement or isomerization reactions.

␈↓αStereoisomers.␈↓β Ugi et al. (19) have defined the "chemical
constitution" of an atom to be its bonds and bonded neighbors. Those
permutational isomers which differ only by permutations of ligands at
constitutionally equivalent positions form the class of
stereoisomers.

␈↓αIsomers Under Rigid Molecular Symmetry.␈↓β If one conceives of
molecular structures as having rigid skeletons, the physical
rotational (three dimensional) symmetries and transformations may be
readily defined. Each transformation causes each atom (and bond) to

------------------
19) I. Ugi, D. Marquarding, H. Klusacek, G. Gokel, and P. Gillespie,
␈↓↓Angew. Chem. internat. Edit.␈↓β, ␈↓α9␈↓β, 703 (1970).

20) P. Gillespie, P. Hoffman, H. Klusacek, D. Marquarding, S.
Pfohl, F. Ramirez, E. A. Tsolis, and I. Ugi, ␈↓↓Angew. Chem.␈↓β
␈↓↓internat. Edit.␈↓β, ␈↓α10␈↓β, 687 (1971).

21) a) W. G. Klemperer, ␈↓↓J. Amer. Chem. Soc.␈↓β, ␈↓α94␈↓β, 6940 (1972);
b) W. G. Klemperer, ␈↓↓ibid␈↓β, p. 8360.

occupy the position of another or same atom (and bond) so that the
rotated structure can physically occupy its former position and at
the same time be indistinguishable from it in any way. This is the
most familiar form of symmetry. Under this type of symmetry
conformers are distinguishable and belong in distinct equivalence
classes. Every transformation is orthogonal and preserves bond
angles and bond lengths as well as maintaining true chirality.

If one allows other orthogonal transformations that alter chiral
properties of structures, equivalence classes result that treat both
the left-handed and right-handed forms of chiral molecules to be the
"same". Thus a "mirror image" transformation when suitably defined
permits the left-handed form to exactly superimpose the right-handed
form and vice-versa.

␈↓αIsomers Under Total Molecular Symmetry.␈↓β If in addition to the above
mentioned rigid molecular transformations one recognizes the
flexional movements of a nonrigid skeleton, a dynamic symmetry group
may be defined. Under this definition, different conformers now are
grouped together. Thus the "chair" and "boat" conformations of
cyclohexane belong to the same equivalence class under dynamic
symmetry. The permutation group of skeletal flexibility is
computable separately and independently of rigid molecular symmetry.
One can then view total molecular symmetry as the product of the two
finite permutation groups.

␈↓αIsomers Under Connectivity Symmetry.␈↓β The concept of connectivity
symmetry was introduced previously (METHOD section). Every
permutation of atoms and bonds onto themselves is a symmetry
transformation for connectivity symmetry if,

a) each atom is mapped into another of like species, e.g., N to
N, C to C, O to O, and

b) for every pair of atoms, the connectivity (none, single,
double , triple, ...) is preserved in the mapping, i.e. the the
connectivity of the two atoms is identical to the connectivity
of the atoms they are mapped into.

One can readily recognize that transformations as defined
automatically preserve the valence and bond distribution of every

atom. It is very probable that readers accustomed to three
dimensional rotational and reflectional symmetries will tend to
equate them with the symmetries of connectivity. It is emphasized
again that connectivity symmetry does not consider bond lengths or
bond angles, and it includes certain transformations that are
conceivable but have no physical interpretation save that of
permuting the atoms and bonds.

␈↓αAppendix C␈↓β

␈↓αSuperatom ParAtitions.␈↓β The first step is to replace the hydrogen
count with the degree of unsaturation. The number of unsaturations
(rings plus double bonds) is determined from the empirical formula in
the normal way, as given in equation 1.
n
U = 1/2 (2+∃ (i-2)a ) (1)
i=1 i

U = unsaturation
i = valence
n = maximum valence in composition
a = number of atoms with valence i
i

If the unsaturation count is zero, the formula is passed immediately
to the acyclic generator. Specifying the unsaturations as U's, the
example C H becomes C U (hydrogen atoms are omitted by
6 8 6 3
convention).

There are several rules which are used during the partitioning
scheme, as follows:

I. The resulting formula is stripped of other univalent atoms (e.g.,
chlorine) as such atoms cannot be part of two-connected ring-
superatoms. These univalent atoms are relegated to the pot of
remaining atoms.

II. The remaining pot in a given partition (those atoms not
allocated to superatompots) can contain NO unsaturations. Thus
ALL rings and/or double bonds will be generated from the
superatompots.

III. It follows that every superatompot in the partition must
contain at least two atoms of valence two or higher plus at
least one unsaturation. If there are no unsaturations then no
rings could be built. In addition, an unsaturation cannot be
placed on a single atom. This rule defines the minimum number
of atoms and unsaturations in a superatompot.

IV. The maximum number of unsaturations in a superatompot is given
by Equation 2. Superatoms must possess at least one free
valence (12), so that superatompots with no free valences, e.g.,
O U or C U , are not allowed, unless the superatompot contains
2 1 2 3
all atoms in the emperical formula (since no univalents, and
thus no hydrogens, are allowed in a superatompot, this is indeed
a rare occurance.)
n
U = 1/2 (∃ (i-2)a ) (2)
max i=3 i

U = maximum unsaturation of a superatompot
max
i = valence
a = number of atoms with valence i
i

V. The maximum number of superatompots for a given formula is
defined by equation 3.
n
S = 1/2∃ a (3)
max i=2 i

n = maximum valence in composition
S = maximum number of superatompots in a superatom partition
max

note: the summation is over all atoms of valence >2;
univalents are not considered.

Rules I-V define the allowed partitions of a group of atoms into
superatompots. These rules do not, however, prevent generation of
equivalent partitions, which would eventually result in duplicate
structures. Defining a canonical ordering scheme to govern
partitioning prevents equivalent partitions. One such canonical
ordering is as follows:

␈↓αCanonical Ordering for Partitioning.␈↓β

a. Partition in order of increasing number of superatompots.

b. For each entry in each part of (a), partition in order of
decreasing size of superatompot by allocation of atoms one at a
time to the remaining pot.

c. Each individual partition containing two or more
superatompots must be in order of equal or decreasing size of
the superatompot. In other words, the number of atoms and
unsaturations in superatompot n+1 must be equal to or less than
the number in superatompart n. The program notes the equality
of superatompots in a partition to avoid repetition.

The application of rules I-V is best illustrated through reference to
the example of C U . The maximum number of superatompots for this
6 3
example is three (Equation 3). There is one way to partition C U
6 3
into one superatompot with no remaining pot, partition 1, Table II.
Subsequent assignment of carbon atoms one at a time to the remaining
pot results in partitions 2-4, Table II. The next partition
following the sequence 1-4 would be C U with C assigned to the
2 3 4
remaining pot. This partition is forbidden as C U has no free
2 3
valences. The three ways to partition C U into two superatompots
6 3
are indicated along with the corresponding partitions following
assignment of atoms to the remaining pot, as partitions 5-10, Table
II. There is only one unique way of partitioning C U into three
6 3
superatompots, partition 11, Table II.

␈↓αCalculation of Free Valence.␈↓β The expression for the free valence of
a superatompot is given by equation 4.
n
FV = (2 +∃ (i-2)a )-2U (4)
i=3 i

U = unsaturation of superatompot
i = valence
n = maximum valence in composition
a = number of atoms with valence i
i
FV = free valence

␈↓αPartitioning of Free Valence.␈↓β Because ring-superatoms are two-
connected structures two valences of each atom of a superatompot must
be used to connect the atom to the ring-superatom. Thus no free
valences can be assigned to bivalent nodes in the valence list, a
maximum of one to each trivalent, a maximum of two to each
tetravalent, and so forth. The example (Fig. 2) is further
simplified in that there are only tetravalent nodes in the valence
list. Inclusion of trivalent nodes (e.g., nitrogen atoms) merely
extends the number of possible partitions. The free valences are
partitioned among the tetravalent nodes in all ways, as illustrated
in Figure 2. It is important to note that removal of atom names
makes all n-valent (n=2 or 3 or ...) nodes in the valence list
equivalent at this stage. Thus the partitions (of eight free
valences among six tetravalent nodes) 222200, 222020, 222002, ......,
002222 are all equivalent. Only one of these partitions is
considered to avoid eventual duplication of structures.

␈↓αCalculation of Loops.␈↓β There are several rules which must be
followed in consideration of loop assignment to ring-superatoms. The
minimum (MINLOOPS) and maximum (MAXLOOPS) numbers of loops for a
given valence list are designated by equations 5 and 6.
1 n
MINLOOPS = max{0 ,a + -(2j -∃ ja )} (5)
2 2 max i=2 j

n j-2
MAXLOOPS = min{a , ∃ (---a } (6)
2 j=4 2 j

MINLOOPS = minimum number of loops.
MAXLOOPS = maximum number of loops.
a = number of secondary nodes in degree list.
2
j = degree of highest degree item in degree list.
max
j = degree.
n = highest degree in list.
a = number of nodes with degree j.
j

The form of the equations results from the following considerations:

1) Only secondary nodes may be assigned to loops. Nodes of

higher degree will always be in the non-loop portion of the
ring-superatom.

2) A loop, by definition, must be attached by two bonds to a
single node in the resulting ring-superatom. The loop cannot
be attached through the free valences. Thus the degree list
must possess a sufficient number of quaternary or higher degree
nodes to support the loop(s).

3) Each loop must have at least one secondary node, which is
the reason MAXLOOPS is restricted to be at most the number of
secondary nodes in the degree list (Equation 6).

4) There must be available one unsaturation for each loop
(this is implicit in the calculation of MINLOOPS and MAXLOOPS)
as each loop effectively forms a new ring.

␈↓αPartitioning of Secondary Nodes among Loops/Non-Loops.␈↓β For each of
the possible numbers of loops (0,1, ...) the secondary nodes are
removed from the degree list and partitioned among the loops,
remembering that the loops are at present indistinguishable and each
loop must receive at least one secondary node. In the example (Fig.
2), starting with the degree list (4,0,2), there are three ways of
partitioning the four secondary nodes among two loops and the
remaining non-looped portion. Removal of the four secondary nodes
from the degree list and assignment of two, three or four of them to
two loops results in the list specified in Figure 2 as the "reduced
degree list". Specification of two loops transforms the two
quaternary nodes in the degree list into two secondary nodes. This
results from the fact that two valences of a quaternary or higher
degree node must be used to support each loop. These are "special"
secondary (or higher, for atoms with valence > 4) nodes, however, as
these particular nodes will have loops attached as the structure is
built up. Thus, in the example, any secondary nodes which are found
in the reduced degree list will have a loop attached in a subsequent
step. The degree list (4,0,2) thus becomes the reduced degree list
(2,0,0) in the partition specifying two loops (Fig. 2). Similarly,
the partition of one loop for the degree list (3,2,1) results in a
reduced degree list of (1,2,0) with the three original secondary
nodes partitioned among loop and non-loop portions (Figure 2).

If, after the first, second, ... nth loop partition, there remain one

or more quaternary or higher degree nodes in the reduced degree list,
the list must be tested again for the possibility of additional
loops. Each loop partition will result in an additional set of
structures. The second pass will yield those structures possessing
loops on loops, and so forth. One such superatom which would be
generated in this manner from a composition of (at least) C U is 15.
6 5

C=C=C=C=C=C
␈↓α15␈↓β

␈↓αPartitioning of Non-Loop Secondary Nodes Among Edges.␈↓β The secondary
nodes which were not assigned to loops ("non-looped secondary nodes")
are partitioned among the edges of the graphs after labelling with
loops. Loops are not counted as edges. There are, for example, five
ways to partition four non-loop secondary nodes among the edges of
the vertex-graph possessing two quaternary nodes (Fig. 2).

␈↓αPartitioning of Loop Secondary Nodes among Loops.␈↓β This partitioning
step is carried out assuming indistinguisability of the loops. Each
loop must recieve at least one secondary node, which limits the
number of possible partitions. Results are presented in Figure 2.

␈↓αAppendix D - Acyclic generator␈↓β

A method of construction of structures similar to the generation of
acyclic molecules is utilized to join multiple ring-superatoms and
remaining atoms. The DENDRAL algorithm for construction of acyclic
molecules (3,10a,24) relied on the existence of a unique central atom
(or bond) to every molecule. The present acyclic generator uses the
same idea. The present algorithm, though simpler in not having to to
treat interconnection of atoms or ring-superatoms through multiple
bonds, is more complex because of the necessity to deal with the
symmetries of the ring-superatoms.

␈↓αD1. Method for the case with even number of total atoms.␈↓β

The superatom partition C U /C U /-/C (partition 7, Table II and
2 2 2 1 2
Figure 2) will be used here to illustrate this procedure. The
superatompots C U and C U have exactly one possible ring-superatom
2 2 2 1
for each (see Table VII).

--------------------------------------------------------------------
Table VII.
Superatompot Superatom

C U -C≡≡C-
2 2
C U >C==C<
2 1

--------------------------------------------------------------------

Thus acyclic structures are to be built with -C≡≡C- , >C==C< and two
C's.

There are an even number of atoms and ring-superatoms. The
structures to be generated fall into two categories: (a) those with
bond centroid; (b) those with an atom centroid.

------------------
24) See B. G. Buchanan, A. M. Duffield, and A. V. Robertson, in "Mass
spectrometry, Techniques and Applications," G. W. A. Milne, ed., John
Wiley and Sons, Inc., 1971, p. 121.

␈↓αCategory A. BOND CENTROID (see Fig. 3)␈↓β

␈↓αStep 1. Partition into Two Parts.␈↓β

The atoms and ring-superatoms in the list of superatoms are
partitioned into two parts, with each part having exactly half the
total number of items. Each atom or ring-superatom is a single item.
Each part has to satisfy equation 7, called the Restriction on
Univalents.

␈↓αRestriction on Univalents:␈↓β
n
1+∃ (i-2) a ≥ 0 (7)
i=1 i

i = valence.
a = number of atoms or superatoms of valence i.
i
n = maximum valence in composition.

There are two ways of partitioning the four items into two parts
(Fig. 3). The restriction on univalents is satisfied in each case.
The restriction will disallow certain partitions that have "too many"
univalents other than hydrogens and therefore is essential only in
partitioning compositions that contain any number of non-hydrogen
univalents.

␈↓αStep 2. Construct Radicals from Each Part.␈↓β

Using a procedure described in Section D3, radicals are constructed
from each part in each partition. The result of application of this
procedure to the example as shown in Table VIII.

--------------------------------------------------------------------
Table VIII. Radicals Constructable from Given Parts

Part | Radicals
---------------------+------------------------------

-C≡≡C- , >C==C< -> -C≡≡C-CH=CH2

-CH=CH-C≡≡CH

-C-C≡CH
"
CH2

---------------------+------------------------------

C2 -> -CH2-CH3

---------------------+------------------------------

-C≡≡C- , C -> -C≡≡C-CH3

-CH2-C≡≡CH

---------------------+------------------------------

>C==C< , C -> -CH==CH-CH3

-C-CH3
"
CH2

-CH2-CH==CH2
--------------------------------------------------------------------

␈↓αStep 3. Form Molecules From Radicals.␈↓β

The radicals are combined in unique pairs, within each initial
partition. Each pair gives rise to a unique molecule, for each of
which the centroid is a bond. There are nine such molecules for the
example chosen (Fig. 3).

␈↓αCategory B. ATOM CENTROID (see Fig. 4).␈↓β

␈↓αStep 1. Selection of Centroid.␈↓β
One must consider every unique atom or ring-superatom that has a free
valence of three or higher as an atom centroid. In the example, of
three candidates available: -C≡≡C- , >C==C< and C, the first is not
chosen for it has a free valence of only two.

␈↓αStep 2. Partition the Rest of the Atoms.␈↓β
The atom or ring-superatom chosen for the center is removed from the
set and the rest are partitioned into a number of parts less than or
equal to the valence of the central atom. Each part must have less
than half the total number of items being partitioned (again a ring-
superatom is a single item). Each part must satisfy the restriction
on univalents (equation 7).

Thus, for the case where a carbon is the center, from one to four
partitions are generated with the condition that each part has at
most 1 atom (the number of atoms is ≤(4-1)/2). No valid partitions
are possible for one, two or four parts. There is exactly one
partition for three parts, i.e., one in each. The partitions are
shown in Figure 4.

␈↓αStep 3. Construct Radicals.␈↓β
Once again, using the procedure described in Section D3, radicals are
constructed for each part in each partition. For example, the
partition -C≡≡C- gives rise to exactly one possible radical -C≡≡CH
(Fig. 4).

␈↓αStep 4. Combine Radicals.␈↓β
Although in the example shown every part generates only one radical,
in the general case there will be many radicals for each part. If
so, the radicals must be combined to give all unique combinations of
radicals within each part.

␈↓αStep 5. Form Molecules from Central Atom and Radicals.␈↓β
If the central atom is not a ring-superatom but is a simple atom,
then each combination of radicals derived in Step 4 defines a single
molecule that is unique. Thus for example when C is chosen as the
centroid, step 4 gives one combination of radicals which determines a
single molecule when connected to the central C (see Figure 4).

If the central atom is a ring-superatom and the valences of the ring-
superatom are not identical then different ways of distributing the
radicals around the center may yield different molecules. Labelling
of the free valences of the central ring-superatom with radicals
treated as labels (supplemented with adequate number of hydrogens to
make up the total free valence of the ring-superatom) generates a
complete and irredundant list of molecules. Thus >C==C< is labelled
with the label set:
one of -C≡≡CH , two of -CH3 , and one of -H.

There are two unique labellings as shown in Figure 4.

␈↓αD2. Method for odd number of total atoms.␈↓β

With an odd number of total atoms, no structures can be generated
with a bond centroid. Only atom centroid is possible. However, it
is possible for structures to be built with a bivalent atom at the
centroid. Thus the procedure outlined in Category B above is
followed, in this case also allowing a bivalent atom as the centroid.

␈↓αD3. Construction of Radicals␈↓β

The goal of this procedure is to generate all radicals from a list of
atoms and ring-superatoms. A radical is defined to be an atom or
superatom with a single free valence. When a composition of atoms
and ring-superatoms is presented, from which radicals are to be
constructed, two special cases are recognized.

␈↓αSpecial Case 1. Only One Atom in Composition.␈↓β
When only one atom which is not a ring-superatom is in the list, only
one radical is possible. For example, with one C, the radical -CH3
is the only possibility.

␈↓αCase 2. Only One Ring-superatom in Composition.␈↓β

In this case, depending upon the symmetry of the ring-superatom,
several radicals may be possible. This is determined by labelling
the free valences of the ring-superatom with one label of a special
type, a "radical-valence".
Example: A composition consists of one ring-superatom, ␈↓α16.␈↓β

C-
/"
>C< "
\"
C-

␈↓α16␈↓β

Two radicals result from labelling with one radical valence.

CH C-
/" /"
-CH< " CH2< "
\" \"
CH CH

␈↓α17 18␈↓β

␈↓αGENERAL CASE␈↓β

Radicals have uniquely defined centroids as well(10a,24). The
centroid is always an atom of valence two or higher. The steps for
construction of radicals are as follows.

␈↓αStep 1. Selection of Atom Centroid.␈↓β

Any bivalent or higher valent atom or ring-superatom is a valid
candidate to be the centroid of a radical. Thus, for example, for
the composition -C≡C-, >C=C< (see part 1 in Figure 3) both are valid
centroids (Figure 5).

␈↓αStep 2. Partition the Rest of the Atoms.␈↓β
The atom chosen for the centroid is removed from the composition. One
of the valences of the centroid is to remain free. Therefore, the
rest of the atoms in the composition are partitioned into less than
or equal to (valence of centroid - 1) parts. Of course, each part
should satisfy the restriction on univalents (equation 7) but for
constructing radicals there is no restriction on the size of the
parts.

␈↓αStep 3. Form Radicals from Each Part.␈↓β
The procedure to construct radicals is freshly invoked on each part
thus constructing radicals. Each part in Figure 5 gives rise to only
one radical, each arising from special case 2.

␈↓αStep 4. Combine Radicals in Each Part.␈↓β
For the example in Figure 5, each part yields only one radical. In a
more general situation, where the rest of the composition after
selection of a centroid, is partitioned into several parts, and where
each part yields several radicals, the radicals are combined to
determine all unique combinations of radicals.

␈↓αStep 5. Label Central Atom with Radicals.␈↓β
If the center is an atom (not a ring-superatom) then each unique
combination defines a single unique radical.

If the center is a ring-superatom, the radicals are determined by
labelling the center with a set of labels which includes;I) the
radicals; ii) a leading radical-valence; iii) an adequate number of
hydrogens to make up the remaining free valences of the ring-
superatom. One selection of center gives one radical and the other
gives two more, to complete a list of three radicals for the example
chosen (Fig. 5).

␈↓αSummary␈↓β

For the example chosen to illustrate the operation of the acyclic
generator, twelve isomers are generated, nine shown in Figure 3 and
three shown in Figure 4.